Bubble Clustering: Set Covering via Union of Ellipses

Authors

  • Matt Kraning
  • Arezou Keshavarz
  • Lei Zhao
Abstract

We develop an algorithm called bubble clustering, which attempts to cover a given set of points with the union of k ellipses of minimum total volume. The algorithm operates by splitting and merging ellipses according to an annealing schedule and does not assume any prior distribution on the data. We compare our algorithm with k-means and the EM algorithm for mixtures of Gaussians. Numerical results suggest that our algorithm achieves superior performance to k-means for priorless data and comparable performance to algorithms that have access to prior information for statistically generated data.

I. PROJECT GOAL

We want to cover a set of N points $X = \{x_1, x_2, \ldots, x_N\}$, $x_i \in \mathbf{R}^n$, by the union of k ellipses $(E_1, E_2, \ldots, E_k)$ such that the total volume of all ellipses is minimized. Unlike mixture of Gaussians, we do not assume a prior distribution on the data, nor do we even assume the data has a prior distribution. Consequently, the optimization problem we are attempting to solve is

$$
\begin{array}{ll}
\text{minimize} & \sum_{i=1}^{k} \log\det A_i^{-1} \\
\text{subject to} & \|A_{I_j} x_j + b_{I_j}\|_2 \le 1, \quad I_j \in \{1, \ldots, k\}, \quad j = 1, \ldots, N \\
& A_i \in \mathbf{S}^n_{++}, \quad i = 1, \ldots, k,
\end{array}
$$

which is an NP-hard problem due to the presence of the constraint $I_j \in \{1, \ldots, k\}$; this constraint specifies that we find the optimal assignment of every given data point $x_j$ to a specific ellipse $E_{I_j}$.

II. ALGORITHM

Because solving our problem exactly is NP-hard, we instead focus on developing an approximate algorithm. Rather than use EM, whose performance can depend heavily on initial starting conditions that are usually randomly generated, we sought an algorithm that is both deterministic and as free of tunable parameters as possible. This was done to maximize the algorithm's universality: it should not depend on random numbers, nor should it have to be highly 'tuned' to work well on specific datasets. Bubble clustering works by iteratively splitting minimum-volume ellipses that were fit to parts of the data on earlier iterations.
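To make the problem statement in Section I concrete, the sketch below evaluates the objective and checks the covering constraint for a fixed candidate assignment. It relies on the fact that the set $\{x : \|Ax + b\|_2 \le 1\}$ is the image of the unit ball under $u \mapsto A^{-1}(u - b)$, so its volume is proportional to $\det A^{-1}$. The function names (`covers`, `objective`, `is_feasible`), the restriction to two dimensions, and the example data are illustrative assumptions, not part of the paper.

```python
import math

def covers(A, b, x, tol=1e-12):
    """Return True if the 2-D point x satisfies ||A x + b||_2 <= 1."""
    y0 = A[0][0] * x[0] + A[0][1] * x[1] + b[0]
    y1 = A[1][0] * x[0] + A[1][1] * x[1] + b[1]
    return math.hypot(y0, y1) <= 1.0 + tol

def objective(ellipses):
    """Sum of log det A_i^{-1} = -log det A_i over all ellipses (A_i, b_i)."""
    total = 0.0
    for A, _b in ellipses:
        det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
        if det <= 0:
            raise ValueError("each A must be positive definite")
        total += -math.log(det)
    return total

def is_feasible(ellipses, points, assignment):
    """Check that every point x_j lies inside its assigned ellipse E_{I_j}."""
    return all(
        covers(ellipses[I][0], ellipses[I][1], x)
        for x, I in zip(points, assignment)
    )

# Hypothetical example: a unit circle at the origin and a circle of
# radius 0.5 centred at (3, 0), written in the (A, b) parametrization.
ellipses = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[2.0, 0.0], [0.0, 2.0]], [-6.0, 0.0]),
]
points = [(0.5, 0.0), (3.2, 0.0)]
```

For a fixed assignment the problem is convex in the pairs $(A_i, b_i)$; the hardness comes from searching over the assignments $I_j$, which this sketch takes as given.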
The intuition behind bubble clustering is that good clustering performance can be obtained by looking at a global summary of part of the dataset (the points within a given ellipse) and then taking a lower-volume covering of that same set by splitting the ellipse containing that set into two smaller ellipses. After splitting down to k ellipses, bubble clustering then performs a variant of simulated annealing to escape from weak local minima. The annealing schedule controls whether ellipses are split or merged during each iteration and guarantees that the number of ellipses converges to the pre-specified value k. This is described by the pseudocode below; a more detailed description of how splitting and merging of ellipses are performed is given in the following sections.

Given data X, number of final clusters k, and an annealing schedule, anneal:

    fit a single ellipse to the data
    for t = 1 to k do
        split ellipse
    end for
    for i = 1 to length(anneal) do
        if anneal(i) == split then
            split ellipse
        else if anneal(i) == merge then
            if one ellipse contained inside another then
                eliminate smaller ellipse
            else
                merge two ellipses
            end if
        end if
    end for

A. Splitting

The density of the ith ellipse, $\rho_i$, is the ratio of the volume of the ellipse to the number of data points inside the ellipse, i.e.

$$
\rho_i = \frac{\mathrm{vol}(E_i)}{\left|\{\, j : 1 \le j \le N,\ x_j \in E_i \,\}\right|}.
$$

In the splitting phase of a given iteration with $\kappa$ ellipses, bubble clustering greedily chooses the ellipse with lowest density, $E_I$, where

$$
I = \operatorname*{argmin}_{i \in \{1, \ldots, \kappa\}} \rho_i,
$$

and splits it into two ellipses $E_{I_1}$, $E_{I_2}$. It splits $E_I$ along the hyperplane perpendicular to the direction that gives maximum data variance for the data points within $E_I$. This direction is the eigenvector corresponding to the maximum eigenvalue of the empirical covariance matrix of the data points inside $E_I$.
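The splitting rule can be sketched in a few lines. The version below finds the direction of maximum variance by power iteration on the empirical covariance matrix and partitions the points by the sign of their projection onto it. Placing the hyperplane through the sample mean is an assumption made here for illustration (the text fixes only its orientation), and the function name `split_points` and the use of power iteration in place of a full eigendecomposition are likewise choices of this sketch.

```python
def split_points(points, iters=500):
    """Split points by a hyperplane normal to the direction of maximum variance.

    Assumption: the hyperplane passes through the sample mean.
    Returns (direction, left, right).
    """
    n, d = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(d)]
    centered = [[p[k] - mean[k] for k in range(d)] for p in points]
    # Empirical covariance matrix of the points.
    cov = [[sum(c[r] * c[s] for c in centered) / n for s in range(d)]
           for r in range(d)]

    # Power iteration for the top eigenvector, started from the coordinate
    # axis with the largest variance. A start vector exactly orthogonal to
    # the top eigenvector would not converge; this sketch ignores that case.
    start = max(range(d), key=lambda k: cov[k][k])
    v = [1.0 if k == start else 0.0 for k in range(d)]
    for _ in range(iters):
        w = [sum(cov[r][s] * v[s] for s in range(d)) for r in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # all points coincide: no direction to split along
            break
        v = [x / norm for x in w]

    # Partition by the sign of the projection onto the principal direction.
    left, right = [], []
    for p, c in zip(points, centered):
        proj = sum(c[k] * v[k] for k in range(d))
        (left if proj < 0 else right).append(p)
    return v, left, right

# Hypothetical data: two groups separated along the first coordinate.
group_a = [(-5.0, 0.0), (-4.0, 0.5), (-4.5, -0.5)]
group_b = [(4.0, 0.0), (5.0, 0.5), (4.5, -0.5)]
direction, left, right = split_points(group_a + group_b)
```

Each of the two returned point sets would then be passed to a minimum-volume ellipse fit.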
After splitting, it fits new minimum-volume ellipses around both sets of points by solving the convex optimization problem

$$
\begin{array}{ll}
\text{minimize} & \log\det A_{I_j}^{-1} \\
\text{subject to} & \|A_{I_j} x_i + b_{I_j}\|_2 \le 1, \quad \forall x_i \in E_{I_j}
\end{array}
$$

for j = 1, 2.

Fig. 1. Splitting along the eigenvector of maximum eigenvalue for the empirical covariance matrix of the data points inside the chosen ellipse.

B. Merging Ellipses

In the merging step, we first check if there is any ellipse that is completely contained in another ellipse. Denote the ellipses by $E_i = \{x \mid \|A_i x - b_i\|_2^2 \le 1\}$ and $E_j = \{x \mid \|A_j x - b_j\|_2^2 \le 1\}$. We want to check if $E_i \subseteq E_j$, or equivalently, whether

$$
\|A_i x - b_i\|_2^2 \le 1 \;\Rightarrow\; \|A_j x - b_j\|_2^2 \le 1.
$$

The S-procedure specifies the necessary and sufficient conditions for this to occur:

$$
\left(\exists\, \tau \ge 0 \text{ such that } Q - \tau P \succeq 0\right) \iff \left(x^T P x \ge 0 \Rightarrow x^T Q x \ge 0\right) \quad [1].
$$

Applying the S-procedure to our problem, it is sufficient to check the feasibility of the linear matrix inequalities (LMIs)
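The minimum-volume fit used after each split can be approximated without a general-purpose convex solver. The sketch below uses Khachiyan's algorithm, a standard iterative method for the minimum-volume enclosing ellipsoid; this is a swapped-in technique for illustration, and the excerpt does not say which solver the authors used. It returns the ellipsoid as a centre `c` and shape matrix `S` with $\{x : (x - c)^T S (x - c) \le 1\}$; in the notation of Section I this corresponds to $A = S^{1/2}$ and $b = -Ac$. The names `mvee` and `_inverse`, the tolerance, and the iteration cap are assumptions of this sketch, and the result is approximate: points may lie marginally outside the returned ellipsoid.

```python
def _inverse(M):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(M)
    aug = [list(row) + [1.0 if r == c else 0.0 for c in range(n)]
           for r, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix (points may be degenerate)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def mvee(points, tol=1e-4, max_iter=1000):
    """Approximate minimum-volume enclosing ellipsoid via Khachiyan's algorithm.

    Returns (c, S) describing {x : (x - c)^T S (x - c) <= 1}.
    """
    n, d = len(points), len(points[0])
    lifted = [list(p) + [1.0] for p in points]  # points in homogeneous form
    u = [1.0 / n] * n                           # weights on the points
    for _ in range(max_iter):
        X = [[sum(u[i] * lifted[i][r] * lifted[i][s] for i in range(n))
              for s in range(d + 1)] for r in range(d + 1)]
        Xinv = _inverse(X)
        M = [sum(q[r] * Xinv[r][s] * q[s]
                 for r in range(d + 1) for s in range(d + 1)) for q in lifted]
        j = max(range(n), key=lambda i: M[i])   # point farthest "outside"
        step = (M[j] - d - 1.0) / ((d + 1.0) * (M[j] - 1.0))
        new_u = [(1.0 - step) * ui for ui in u]
        new_u[j] += step
        change = sum((a - b) ** 2 for a, b in zip(new_u, u)) ** 0.5
        u = new_u
        if change < tol:
            break
    c = [sum(u[i] * points[i][k] for i in range(n)) for k in range(d)]
    cov = [[sum(u[i] * points[i][r] * points[i][s] for i in range(n)) - c[r] * c[s]
            for s in range(d)] for r in range(d)]
    S = [[x / d for x in row] for row in _inverse(cov)]
    return c, S

# Hypothetical check: the corners of a square are enclosed by a circle.
square = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
centre, shape = mvee(square)
```

The input points must not all lie in a lower-dimensional affine subspace, otherwise the matrices being inverted are singular and the sketch raises an error.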


Similar resources

A New Approach for Determination of Neck-Pore Size Distribution of Porous Membranes via Bubble Point Data

Reliable estimation of the neck-pore size distribution (NPSD) of porous membranes is the key element in the design and operation of all membrane separation processes. In this paper, a new approach is presented for reliable estimation of the NPSD of porous membranes using wet flow-state bubble point test data. For this purpose, a robust method based on the linear regularization theory is developed to extract NPSD...


A New Heuristic Algorithm for Solving the Set Covering Location Problem

The set covering problem has many applications, such as emergency systems, retailers’ facilities, hospitals, radar devices, and military logistics, and it is NP-hard. The goal of the set covering problem is to find a subset such that the union of the subset members covers the whole set. In this paper, we present a new heuristic algorithm to solve the set covering pr...


Interactional effects of bubble size, particle size, and collector dosage on bubble loading in column flotation

The success of flotation operation depends upon the thriving interactions of chemical and physical variables. In this work, the effects of particle size, bubble size, and collector dosage on the bubble loading in a continuous flotation column were investigated. In other words, this work was mainly concerned with the evaluation of the true flotation response to the changes in the operating varia...


Computational Simulation of Hydrodynamic Convection in Rising Bubble Under Microgravity Condition

In this work, rising of a single bubble in a quiescent liquid under microgravity condition was simulated. The related unsteady incompressible full Navier-Stokes equations were solved using a conventional finite difference method with a structured staggered grid. The interface was tracked explicitly by connected marker points via hybrid front capturing and tracking method. One field approximatio...


Multiple ellipse fitting by center-based clustering

This paper deals with the multiple ellipse fitting problem based on a given set of data points in a plane. The presumption is that all data points are derived from k ellipses that should be fitted. The problem is solved by means of center-based clustering, where cluster centers are ellipses. If the Mahalanobis distance-like function is introduced in each cluster, then the cluster center is repr...




Publication date: 2009